Model selection using Rademacher Penalization
Abstract
In this paper we describe the use of Rademacher penalization for model selection. As in Vapnik's Guaranteed Risk Minimization (GRM), Rademacher penalization attempts to balance the complexity of the model against its fit to the data by minimizing the sum of the training error and a penalty term, which is an upper bound on the absolute difference between the training error and the generalization error. However, while the GRM penalty is universal, the computation of the Rademacher penalty is data-driven: it depends on the distribution of the data, so one can expect better performance on particular instances of learning problems. We present experimental evidence that Rademacher penalization is an effective method of model selection. In particular, we show that for the intervals model selection problem, Rademacher penalization outperforms GRM and cross-validation (CV) over a wide range of sample sizes. Our experiments also show that the Rademacher penalty more closely tracks the behavior of the absolute difference between the generalization error and the training error.
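For concreteness, the following sketch states the standard penalized criterion behind this approach (Koltchinskii's formulation); the notation, with model classes \(\mathcal{F}_k\) and empirical error \(\widehat{\operatorname{err}}_n\), is ours and not taken from the abstract. Given i.i.d. Rademacher signs \(\sigma_1,\dots,\sigma_n \in \{-1,+1\}\) drawn independently of the sample \((x_1,y_1),\dots,(x_n,y_n)\), the empirical Rademacher penalty of a class \(\mathcal{F}_k\) is

\[
\widehat{R}_n(\mathcal{F}_k) \;=\; \mathbb{E}_{\sigma}\,\sup_{f \in \mathcal{F}_k}\Bigl|\,\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\mathbf{1}\{f(x_i)\neq y_i\}\,\Bigr|,
\]

and model selection picks the class minimizing the penalized training error,

\[
\hat{k} \;=\; \operatorname*{arg\,min}_{k}\;\Bigl(\widehat{\operatorname{err}}_n(\hat{f}_k) + \widehat{R}_n(\mathcal{F}_k)\Bigr),
\qquad
\widehat{\operatorname{err}}_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{f(x_i)\neq y_i\}.
\]

In practice the expectation over \(\sigma\) is typically replaced by one or a few random draws of the signs, which is what makes the penalty computable from the data alone, in contrast to the distribution-free GRM penalty.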
Similar Resources
Model selection by resampling penalization
We present a new family of model selection algorithms based on resampling heuristics. These algorithms can be used in several frameworks, do not require any knowledge about the unknown law of the data, and may be seen as a generalization of local Rademacher complexities and V-fold cross-validation. For the example case of least-squares regression on histograms, we prove oracle inequalities, and that these ...
Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees
Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It appears to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prun...
Margin-adaptive model selection in statistical learning
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. In this paper we tackle the problem of adaptivity to this condition in the context of model selection, in a general learning framework. In fact, we consider a weaker version of this condition that allows one to take into account that learning within a small model can be much easier t...
Rademacher Complexity Bounds for a Penalized Multiclass Semi-Supervised Algorithm
We propose Rademacher complexity bounds for multiclass classifiers trained with a two-step semi-supervised model. In the first step, the algorithm partitions the partially labeled data and then identifies dense clusters containing κ predominant classes using the labeled training examples such that the proportion of their non-predominant classes is below a fixed threshold. In the second step, a ...
Discussion of "2004 IMS Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization" by V. Koltchinskii
In recent years, much attention has been paid to the construction of model selection criteria via penalization. Vladimir Koltchinskii is to be congratulated for providing a theory that reaches a level of generality high enough to recover most of the recent results obtained on this topic in the context of statistical learning. Thanks to concentration inequalities and empirical proces...